Learning with Forest Sparsity 



Chen Chen, Yeqing Li and Junzhou Huang * 
Department of Computer Science and Engineering 
University of Texas at Arhngton 



Abstract 



In this paper, we investigate a new model called forest sparsity for sparse learning 
and compressive sensing. It is an extension of standard sparsity when the support set 
of the data is consisted of a series of mutually correlated trees. Forest sparsity exists 
in many practical applications such as multi-contrast MRI, parallel MRI, multispectral 
image and color image recovery. We theoretically prove the benefit of forest sparsity, 
that much less measurements are required for successful recovery in compressive sensing. 
Moreover, a new algorithm is proposed and applied on several applications with forest 
sparsity. All experimental results validate the superiority of forest sparsity. 

1 Introduction 

Sparsity techniques are becoming more and more popular recently in machine learning, 
statistics, medical imaging and computer vision. The data x G is fc-sparse often means 
that X only has exactly k non-zero components with -C A^. Consider a general problem 
for estimating x: 



where A G M™^^ is the system matrix and b G M™ is the observation. There are infinite 
solutions for this problem, if it is underdetermined (m < A^). Without any priors, we could 
not find a better way than checking every solution to determine which one is preferred. 
However, if x is k-sparse, problem ([T| can be solved by many existing algorithms, such as 
lasso [26j and basis pursuit [iOj. Both of them utilize £i norm regularization for the linear 
system. It has been proved that the ii norm regularization can exactly recover the sparse 
data for problem ([T| under some conditions |12]|S]. 

Although ii norm regularization yields sparse solutions, the result could be even better 
if more priors can be utilized. The concept structured sparsity [2][15jjl] arises when data is 
not only sparse, but also organized in structures (e.g. groups, trees, graphs). A well known 
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instance group sparsity or block sparsity [3]|2H] assumes the components of the data are zeros 
or nonzeros in group-wise. £2,1 norm is often chosen as the group sparse penalty because 
it encourages the components in the same group to be zeros or non-zeros simultaneously. 
When one component appears in multiple groups, the problem becomes overlapping group 
sparse regularization |16| . Another example would be the tree sparsity, such as that the 
wavelet coefficients of a natural signal/image yield a binary tree/quadtree. The coefficients 
on the same subtree tend to be zeros or non-zeros simultaneously. Compared to the group 
sparsity, tree sparsity is often approximated by convex programming due to the difficulty 
of overlapping and hierarchical structure [18j[29j[17j. Generally speaking, both the group 
sparsity and tree sparsity belong to graph sparsity, where each components can be viewed 
as a vertex while the connections are edges. 

In this paper, we propose a new sparsity model called forest sparsity. It is a natural 
extension of tree sparsity by assuming that the trees are not independent but mutually 
connected as a forest. We first give the definitions of both tree sparsity and forest sparsity. 
Based on compressive sensing theory, we then prove that for a forest of T k-sparse trees, only 
0{Tk + log{N/k)) measurements are required for successful recovery with high probability. 
That is much less than the bounds of standard sparsity 0{Tk + Tklog{N/k)) and tree 
sparsity 0{Tk + Tlog{N/k)). Finally, we derive an efficient algorithm to optimize the forest 
sparsity model. The proposed algorithm is applied on medical imaging issues such as multi- 
contrast magnetic resonance imaging (MRI), parallel MRI (pMRI), as well as color images, 
multispectral image reconstruction. Numerous experiments demonstrate the advantages of 
forest sparsity over the state-of-the-art models in these applications. 

2 Forest Sparsity 

As mentioned above, both tree sparsity and forest sparsity are extensions of standard sparsity. 
They require that the data is not only sparse, but also follows some special structure. If we 
denote the subspace of the data as Q whose support is in fi, the tree sparse data then can 
be defined as: 

Definition: (Tree sparse data) // a k-sparse data x G can form a tree or can 
be sparsely represented as a tree under one orthogonal sparse basis $, and the k non-zero 
components naturally form a subtree, then it is tree sparse data. That is, Tk = {x = ^^9: 
9\QP = 0, \Q\ = k}, where VL forms a subtree. 

Here, QP means the complement of Vt. This is a more general definition to wavelet tree 
signals [2]. The solution of problem ([T]) constrained by tree sparsity can be more accurate 
since the randomness of the non-zeros elements' positions is significantly reduced. For tree 
sparse data, we say it has the tree sparsity property. Most natural signals or images have 
tree sparsity property, since they can be sparsely represented by the wavelet tree. Specially, 
the wavelet coefficients of a ID signal form a binary tree and those of a 2D image yield a 
quadtree. 

However, for multi-channel tree sparse data (e.g. RGB color images), the tree structure 
could only be utilized independently in all existing methods [18j [29j [17j. Actually, these 
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trees are highly correlated but not independent. Different channels of the data tend to share 
the same support set. This is called the multiple measurement vectors (MMV) problem in 
literature and the data can be jointly recovered [3]. Unfortunately, none of the existing tree 
sparsity methods [18j [29j [17J can jointly recover this type of data. 

Motivated by this limitation, we extend tree sparsity to a more general case forest sparsity. 
A forest in this context is considered as a set of fully connected trees but not individual ones. 
Each tree in the forest has the same number of nodes, edges and the same structure. More 
importantly, all the nodes in the same position across different trees are mutually connected, 
that is, the corresponding values of these nodes are zeros or nonzeros simultaneously. Figure 
[l] shows the forest structure in multi-contrast MR images. Here and later, a set of nodes are 
said to be consistent when their values are zeros or nonzeros simultaneously. If there are T 
sparse trees in a forest, we can define the forest sparse data like the definition of tree sparse 
data. 

Definition: (Forest sparse data) If there are a series of trees that share a common size 
and structure, and the components at the same position across different trees are mutually 
connected, they are defined as a forest. A data is forest sparse only if it is sparse and 
its support set yields a subforest. That is, J^T,k = {xi = ^^di, i = 1,2,...,T,: 6\Q'" = 0, 
\Q\ = Tk}, where Q forms a subforest. 




Figure 1: Forest structure on multi-contrast MR images, (a) Three multi-contrast MR 
images, (b) The wavelet coefficients of the images. Each coefficient tend to be consistent 
with its parent and children, and the coefficients across different trees at the same position, 
(c) One joint parent-child group across different trees that used in our algorithm. 

Similarly, the forest sparse data has forest sparsity property. Each tree in the forest 
depends on all the others. Using forest sparsity as a penalty for problem ([T]), the freedom of 
the subspace Q will be further reduced. Intuitively, better results can be gained. 
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3 Theory of Forest Sparsity 



What kinds of improvement we can achieve from forest sparse data? In this section, we 
theoretically prove the benefit of the forest sparsity based on compressive sensing theory [TT] 
[8j. When it comes to problem Q, A G M™-^^ is a randomly sampling matrix and b G M™' 
is the measurement vector. In compressive sensing, only m measurements are required for 
successful recovery of the sparse data x G with m <^ N . And it assumes that the 
sampling matrix A satisfies the Restricted Isometry Property (RIP)[9j. 

Definition: (k-RIP) For any matrix A G M™-^^ and any integer k < m, for all x in 
the union Qf^, if there exists a constant 6^ > and 

{1 - 6k)\\x\\l < \\Ax\\l < {1 + 6,)\\x\\l 

the matrix A is said to satisfy the k-restricted isometry property with restricted isometry 
constant 5k- 

Qk includes all the subspaces if there is no further constrain on the support set of 
sparse data x. However, when x has some structured sparsity property(e.g. tree sparsity) 
and is in the union of subspaces A, then the k-RIP can be extended to the ^-RIP 

Definition: (^-RIP) For any matrix A G M™^^, for all x in the suhspace A G M^, if 
there exist a constant 5^ > and 

{l-5j,)\\x\\l<\\Ax\\l<{l + 5MA\l 

the matrix A is said to satisfy the A-restricted isometry property with restricted isometry 
constant 5js,. 

The required number of measurements m has been quantified for a matrix A that has 
the A-RIV [7]: 

Theorem 1: (^-RIP) Let A he the union of L subspaces of k dimension in M^. For 
any t > 0, 

2 12 
m> ^(ln(2L) + A;ln — + t) (2) 

then there exists a randomly generated matrix A G M"^^^ satisfy the A-RIP with constant 
c > with probability at least 1 — e~*. 

From (|2]), we could intuitively observe that m can be less by reducing the number of 
subspaces A. It coincides that the result will be improved when more priors are utilized. 
For standard fc-sparse data, there is no more constrain to reduce the number of possible 
subspaces C%. Therefore, 

Corollary: For k-sparse data, the sampling matrix A has the k-RIP with probability 
1 — e^* if the bound for the number of measurements satisfies that m = 0{k + k\og{N / k)) . 

For tree sparse data, the support of x should follow the subtree structure. Then it is 
obviously that VLxree C. Qk and Lxree < Lk- And 
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Lemma 1: For tree sparse data, the sampling matrix A has the Tl-RIP with probability 
1 — e^* if the bound for the number of measurements satisfies that: 

m = 0{k + hg{N/k)) (3) 

For forest sparse data that contains T trees, the bound for the number of measure- 
ment should be T X 0{k + k\og{N/k)) if they are recovered with standard sparsity, which 
is at least 0{Tk + Tklog{N/k)). Instead, some methods model the data with joint spar- 
sity or block sparsity [2] [27]. Although the required measurements then can be reduced to 
0{Tk + klog{N/k)), no tree structure is involved. On the other side, when modeling with 
independent tree sparsity, there is no more constrain among different trees. 

Lemma 2: Modeling the forest data of T trees with independent tree sparsity, the sam- 
pling matrix A has the TT,k-RIP with probability 1 — e~* if the bound for the number of 
measurements satisfies that: 

m = 0{Tk + T\og{N/k)) (4) 

By the definition of forest sparsity, when the positions of the support set for one tree are 
fixed, the positions of the supports on all other trees will be also fixed. Therefore, the 
freedom of subspaces is further limited and the number of measurements obeys: 

Lemma 3: Modeling the forest data of T trees with forest sparsity, the sampling matrix 
A has the J-'t^-RIP with probability 1 — e~* if the bound for the number of measurements 
satisfies that: 

m = 0{Tk + log(A^/A;)) (5) 

The proofs of Lemma 1 to 3 are included in the appendix. Table [T] shows all the mea- 
surement bounds for the J-2-,A;-sparse data with different models. Modeling with standard 
sparsity requires the most measurements as no more structured prior are utilized. It's hard 
to say which one is better between group sparsity and tree sparsity, which depends on T 
and how well the data follow the jointly sparse or tree sparse structure. Forest sparsity 
outperforms all others if the data is forest sparse. 

4 Algorithm 

In this section, we present a convex optimization implementation to approximately solve the 
forest sparsity model. Let be the forest sparse penalty of T trees. We extend the 

standard sparsity problem with convex models to: 

min^llAx - 6II2 + A||<l>x||j-,T (6) 

or 
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Table 1: Measurement bounds for forest sparse data 



Sparse Models Measurement Bounds 



Standard Sparsity 


0{Tk 


+ 


Tk \og{N/k)) 


Joint Sparsity 


OiTk 


+ 


k\og{N/k)) 


Tree Sparsity 


0{Tk 


+ 


ThgiN/k)) 


Forest Sparsity 


OiTk 


+ 


\og{N/k)) 



min s.t. -\\Ax — b\\l < e (7) 

According to its definition, we should treat the coefficients in the same subforest simulta- 
neously (e.g. put them in the same group as £2,1 norm). However, in practical applications, 
we never know the sparse number k, nor the positions of non-zero elements. Then the size of 
forest subspaces can not be fixed. There are two ways to approximate the forest structure. 
One is to assign the coefficients at the same position across different trees in a group. And 
their parent coefficients are involved in the same group (Figure [T](c)). Similar approximation 
has been used in a tree sparsity algorithm [25j. The other one is to assign all coefficients 
of the same path among different trees in a group. For both cases, the original problem is 
transformed to the overlapping group sparsity problem. 

We validate these two assumptions on four types of practical data with forest sparsity 
(color images, multi-contrast MR images, MR images of multi-channel coils and multispectral 
image). The statistics results are shown in Figure |2} A wavelet coefficient is set to zero when 
its value is very small (if its energy is below 3% of the total energy). The rest are considered 
as non-zeros. From the figure, we could observe that most groups follow the forest structure 
assumptions. In addition, the jointly parent-child group implementation is preferred due to 
its less computational complexity. And it is also because this assumption is more close to 
the practical structure on the data. 

With the group configuration described above, the forest sparsity are implemented as 
overlapping group structures with £2,1 norm: 

min^WAx - b\\l + X\\G(l>x\\2,i (8) 

and 

min ||G$x||2,i s.t. ^\\Ax - b\\l < e (9) 
where £2,1 norm is defined as ||a;||2,i = X] ll^slb 9 ^ Q- means the components in one 
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color multi-constrast multispectral multi-coils 



Figure 2: Ratios of wavelet coefficients that follow the forest structure assumption. Two 
types of groups are considered: a) all the nodes from the root to the leaf on the same path 
across different trees; b) joint parent-child pairs across different trees (Figure [l] (c)). 



group and Q is the union of all groups. G is a binary matrix for group configuration. This 
kind of indexing matrix has been introduced in previous work YALLl [27j. The £2,1 norm 
encourages that coefficients in the same group should be consistent. It will be much harder 
for a whole path to be consistent than that for just a parent-child pair. This can further 
explain the selection of our approximation. With ([s]) and (|9]), the problem with forest sparse 
regularization would be easier now. 

We present an efficient algorithm based on FISTA |1] framework for problem ([s]). Sim- 
ilarly, problem ^ could be solved by NESTA [5], although we do not present the details 
in this paper. Both of FISTA and NESTA have the optimal convergence rate 0{l/i^) in 
function values for ffist order methods |23], where i is the iteration index. 

FISTA jl] is a convex approach that iteratively minimizes the lost function with the 
following form: 



minF(x) = f{x) + g{x) 



(10) 



where f{x) is a convex smooth function with Lipschitz constant Lj and g{x) is a convex 
but usually nonsmooth function. In each iteration of FISTA, is updated by: 



arg min{\g{x) 



Li 



(11) 



with Xr. 



[X 



i-l 



— -^V/(x* )). For problem (8|), if we let f{x) 



l\\Ax — b\\l and g{x) 



|2,i, there will be no closed form solution for ( 11 ). We introduce an auxiliary variable 



z to constrain The formulation then becomes to: 



min{^||v4a; - b\\l + XY^W^ah + ^Ik - G<!>x\\l} 



:i2) 



gee 
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where 7 is another positive parameter. We iteratively solve this alternative formula by 
minimizing x and z subproblems respectively. For the z subproblem: 



z, 



argmin{A||2;3||2 + -||% - {G^x)g\\l},g E G (13) 
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which has the closed form solution: 



= max(||r||2 - -,0)^^,c/ e ^ (14) 

where r = {G^x)g. We denote it as a shrinkgroup operation. 
For the x-subproblem: 

1 7 

X = argmin{-||i?x - fe||2 + Ik - ^^3:112} (15) 
^2 2 

The optimal solution is a; = {R^R + X^^G^^G)^^{R^b + ^^G^z), which contains a large 
scale inverse problem. To avoid it, we can apply FISTA to approximate the solution. More- 
over, the compared algorithms are of the type FISTA. This could demonstrate the benefit of 
forest sparsity clearly because the solvers are the same. Let /(x) = l\\Rx-b\\l + ^\\z-G<!>x\\l 
and g{x) = 0. Supposing its Lipschitz constant to be Lj, then the whole algorithm can be 
written in Algorithm [1} 



Algorithm 1 FISTA_Forest 

Input: p = 1/L/, = = 1, A, 7, /c = 1 

while not meet the stopping criterion do 

z = shrinkgroup{G^x''~^ , A/7) 

X* = r* — pV f{r^) 

f+i = [1 + v/1 + 4(tO']/2 

i = i + l 

end while 



Note that this algorithm could solve the combined model with both forest sparsity and 
total variation. We only need to let g{x) = \ \x\\tv for the x-subproblem. 



5 Application 

5.1 Multi-contrast MRI 

Multi-contrast MRI is a common technique to aid clinical diagnosis. For example Tl 
weighted MR images could well distinguish fat from water, with water appearing darker 
and fat brighter. Whereas in T2 weighted images fat is darker and water is lighter, which is 
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well suited to imaging edema. Although with different intensities, T1/T2 or proton-density 
weighted MR images are scanned at the same anatomical position. Therefore they are not 
independent but highly correlated. Suppose {xs}'^=i € R" are the multi-contrast images for 
the same anatomical cross section and {bs}^=i G R™^ are the corresponding undersampled 
data in Fourier domain, then the forest sparse reconstruction can be formulated as: 

T 

X = argmin ||$X||^,r; Yl W^^^i-^ ^) - ^^f ^ (16) 

s=l 

where X = [xi, ...,Xt] and Rg is the measurement matrix for the image Xs- Figure [l] shows 
an example of the forest structure in multi-contrast MR images. 

5.2 pMRI 

To improve the scanning speed of MRI, an efficient and feasible way is to acquire the data in 
parallel with multi-channel coils. The scanning time depends on the number of measurements 
in Fourier domain, and it will be significantly reduced when each coil only samples a small 
partial of the whole measurements. The bottleneck is how to reconstruct the original MRI 
image efficiently and precisely. This issue is called pMRI in literature. Sparsity techniques 
have been used to improve the classical method SENSE [24j. However, when the coil sensi- 
tivity can not be estimated precisely, the final image would contain visual artifacts. Unlike 
previous CS-SENSE |19] which reconstruct the images of multi-coils individually, CaLM- 
MRI |22j recovers the multi-coils images jointly by assuming the data is jointly sparse. The 
problem can be written as: 

as = argmin 1 1 as 1 1 2,1 s.t.\\y - EwaslU < ^ (17) 

where y = [yi,y2, ■■■Uc]'^ is the concatenation of the undersampled data from each coil; C 
denotes the number of coils; E^r = diag{RW'^ , RW^ , RW'^); R is the partial Fourier 
transform and W'^ is the inverse wavelet transform. When the wavelet coefficients as = 
[ai, a2, ...ac]"^ are reconstructed, the MR images can be obtained by the inverse wavelet 
transform. 

The final result of CaLM-MRI is achieved by a sum of square of these images without 
coil sensitivity and SENSE encoding. It shows comparable results with those methods which 
need precise coil configuration. As shown in Figure [3} there is only little difference among 
images obtained from multi-coils. This method can be improved with forest sparsity, since 
the images all follow the forest sparsity assumption. 

5.3 Color Images 

Color images captured by common camera can be represented as combinations of red, green, 
blue three colors. Different colors synthesized by these three colors seems realistic to human 
eyes. By observing the color channels are highly correlated, group prior is utilized in recent 
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(a) (b) (c) (d) 

Figure 3: Multi-coils MRI images 

recovery [2!^. Modeling with £2,1 norm regularization can gain additional SNR to standard 
ii norm regularization. Further more, each color channel tends to be wavelet tree sparse. If 
we model the problem with forest sparsity, this result would be reasonably better. 

5.4 Multispectral Images 

Different from common color images, a multispectral or hyperspectral image is consisted of 
much more bands, which provides both spatial and spectral representations of scenes. It is 
widely utilized on remote sensing with applications to agriculture, environment detection etc. 
However, the collection of large amount of data costs both huge imaging time and storage 
space. By compressive sensing data acquisition, the cost of imaging for remote sensing data 
could be significantly reduced [2DJ. Like RGB images, the bands of multispectral image 
should represent the same scene. Each band has tree sparsity property. Therefore, they 
follow the forest sparse assumption. 

6 Experiments 

We conduct experiments on the RGB color image, multi-contrast MR images, MR images of 
multi-channel coils and the multispectral image to validate the benefit of forest sparsity. All 
experiments are conducted on a desktop with 3.4GHz Intel core 17 3770 CPU. Matlab version 
is 7.8(2009a). All measurements are mixed with 0.01 Gaussian white noise. Signal-to-Noise 
Ratio (SNR) is used as the metric for evaluations. 

6.1 Multi-contrast MRI 

Multi-contrast MR images are typically forest sparse under the wavelet basis. The data 
is extracted from the SRI24 Multi-Channel Brain Atlas Dataset. We compare the pro- 
posed algorithm with the recent fastest algorithm FCSA p3] on CS-MRI, and its extension 
FCSA_MT [13] on the images in Figure |4j The sampling matrix ( shown in Figure |4](d)) is 
a partial Fourier transform and we sample more on the low frequency. The total variation 
term in both FCSA_MT and FCSA are removed. It is because we just want to compare 
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(a) (b) (c) (d) 

Figure 4: (a)-(c) the original multi-constrast images and (d) the samphng mask. 



the forest sparsity with standard sparsity and joint sparsity more clearly. Then FCSA and 
FCSA_MT can be viewed as £i version and £2,1 version of FISTA. The parameter A is set 
0.035 as that in previous experiments, and 7 is set to 0.5A. We run each algorithm 400 
iterations. 
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FISTA_Tree 
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(a) 



(b) 



Figure 5: The performance among FCSA, FCSA_MT, FISTA_Tree and FISTA_Forest. 
The sampling ratio is (a) 20% and their SNR are 27.57, 29.28, 28.75, 30.17 respectively; (b) 
25% and their SNR are 29.73, 32.20, 31.11, 32.83. 



Figure [5] shows the performance of different algorithms without total variation regulariza- 
tion. From the figure, we could observe that modeling with forest sparsity achieves the best 
SNR after convergence. If all algorithms are with total variation regularization, the proposed 
algorithm is still better. It is reasonable because the forest sparsity model utilize all priors 
for reconstruction. Except total variation, FCSA only assumes the data is standardly sparse, 
but there is no more restriction for the randomness of the support set. FCSA_MT assumes 
the images are jointly sparse and models them with £2,1 norm regularization. However, the 
tree structure is not exploited. If the tree structure is utilized independently on each image 
but not correlated, the performance is a little worse than FCSA_MT on this data. The for- 
est model encourages the data to be forest sparse, which is more accurate than joint sparse 
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model and far more accurate than standard sparse mode. Therefore, it always obtains the 
highest SNR at any time. 

6.2 parallel MRI 

There are two steps for compressive sensing pMRI reconstruction in CaLM-MRI |22]- First, 
the images are recovered from the undersampled Fourier signals of different coil channels by 
sparse regularization. In the second step, it does not rely on coil profiles but synthesizes 
these images with the sum-of-square procedure. We compare our algorithm with SPGLl [6j 
which solves the joint £2,1 norm problem in CaLM-MRI. Both algorithms run enough time 
to ensure convergence. 

Table 2: Comparisons of SNRs on different sampling ratios 



sampling ratios 


25% 


20% 


17% 


15% 


SPGLl (step 1) 


26.95 


24.73 


23.06 


22.21 


Forest(step 1) 


27.47 


25.22 


23.37 


22.59 


SPGLl (step 2) 


18.61 


18.24 


16.54 


15.94 


Forest (step 2) 


22.59 


21.52 


18.59 


17.72 



Table 3: Comparisons of SNRs on different number of coils 



number of coils 


2 


4 


6 


8 


SPGLl(step 1) 


23.42 


24.71 


24.89 


25.23 


Forest (step 1) 


24.25 


25.12 


25.29 


25.51 


SPGLl(step 2) 


20.88 


15.71 


20.21 


20.73 


Forest (step 2) 


21.10 


21.39 


22.22 


21.46 



Table [2] and Table [3] show all the SNRs of images obtained in the first step and the second 
step on various sampling ratios, and various numbers of coils for pMRI simulation. For the 
same algorithm, more measurements or more number of coils tend to increase the SNRs 
of images obtained in step 1, although no guaranteed improvement is achieved in the final 
image. Another observation is that the model of forest sparsity is always better than joint 
sparsity due to the tree structure is utilized. The data is not only assumed to be joint sparse 
but "joint" tree sparse. More importantly , it is very convenient to combine total variation 
penalty in the proposed algorithm, but unknown how to do it in SPGLl. 
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6.3 Color Image Reconstruction 



For color images, we compare our algorithm with FISTA_£i, FISTA_£2,i and FISTA_Tree. 
Figure |6] shows the visual results recovered by different sparse penalties. Only after 50 
iterations, the image recovered by our algorithm is very close to the original one with the 
fewest artifacts. 




(d) 



Figure 6: Visual comparisons on the lena image reconstruction after 50 iterations with 
about 20% sampling, (a) the original image and the patch detail; (b) recovered by standard 
sparsity; (c) recovered by group sparsity; (d) the patch details for each recovered image; (e) 
recovered by tree sparsity; (f) recovered by forest sparsity. Their SNRs are 16.65, 17.41, 
17.66 and 18.92, respectively. 



6.4 Multispectral Image Reconstruction 

For multispectral image, we test a dataset of 1992 AVIRIS image Indian Pine Test Site 3 
It is a 2 X 2 mile portion of Northwest Tippecanoe County of Indiana (Figure [7|. There 
are total 220 bands and we only conducted experiment on the 6-th to 14-th bands. Every 3 
bands are reconstructed simultaneously by joint sparse model and forest sparse model. Each 
image is cut to 128 x 128 for convenience. And the number of wavelet decomposition levels 
is set to 3. The SNR of all recovered images are shown in Figure [8j One could observe that 
modeling with forest sparsity always achieves the highest SNRs. 



^The data is downloaded from https : //engineer ing.purdue . edu/~biehl/MultiSpec/h3rperspectral 
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Figure 7: The original multispectral image: band 6 to band 14. 
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Figure 8: Multispectral image reconstruction results by different sparse models with about 
20% sampling. 
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7 Conclusion 



In this paper, we have proposed a novel model forest sparsity for sparse learning and com- 
pressive sensing. This model enriches the family of structured sparsity and can be widely 
applied on numerous fields of sparse regularization problems. The benefit of the proposed 
model has been theoretically proved and empirically validated in practical applications. Un- 
der compressive sensing assumptions, significant reduction of measurements is achieved with 
forest sparsity compared with standard sparsity, group sparsity or independent tree sparsity. 
A new algorithm is developed to efficiently solve the forest sparsity problem. While applying 
it on practical problems such as multi-contrast MRI, pMRI, multispectral image and color 
image reconstruction, extensive experimental results demonstrate the superiority of forest 
sparsity over standard sparsity, group sparsity and tree sparsity in terms of both accuracy 
and computational complexity. 

APPENDIX 



A Proof of Lemma 1 

First, we need to figure out the number of subtrees (size k) of a binary tree (size A^). Note 
that the root of the subtrees should be the binary tree's root. 

Case 1: when k < Llog2 A^J, the number of subtrees of size k is just the Catalan number: 



2k\ ^ (2e)'= ^ e'^N 



k + l\ kj~k + l~k + l 



Case 2: when k > \\0g2 N\ , the number of subtrees of size k should follow [2j: 

^4% 6 \og,N , 128 , 



k a/^ Llog2 k\ ) e2 [log2 k\ 
4\cilog2A^ c2 



k Llog2 k\ ) [\og2 k\ 

k log2 k 

4^(c4Ar) 



< 



< 

k 

where Ci, C2, C3, C4 are some constants. Then we have: 

for k< [log, N\ 
for k > [log2 A^J 



T ^ < ) k+l 
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According to Theorem 1: 



2 12 
m > — (ln(2L) + A;ln— +t) (19) 

CO 



With (19), the number of measurements should satisfy: 



c& 



(In 2 + A; + hi{N/{k + l)) + k ln(12/(5) + t) 



|(ln 2 + A; In 4 + ln(c4Ar/A;) + k ln(12 /5) + t) ^ > 

for k > [log2 A^J 

For both cases, we have m = 0{k + \og{N/k)) as the minimum number of measurements. 
This bound also has been proved in previous papers [2j [15j . 

B Proof of Lemma 2 



From Lemma 1, the number of subtrees for a /c-sparse is (19). Suppose there are T trees 
and they are independent, then the number of combinations should be (Lf)'^. Therefore, 
the number of measurements have to satisfy: 

{^(ln2 + Tk + T\n{N/{k + 1)) + Tk\n{12/6) + t) 
for k< [log, N\ 
^{\n2 + Tk\n4: + T\n{ciN/k) +Tk\n{12/5) +t) ^ ' 

for k > [logs 

For both cases, the required bound is m = 0{Tk + Tlog(A^/A;)). This also can be obtained 
by T X 0{k + \og{N/k)) = 0{Tk + T\og{N/k)). 

C Proof of Lemma 3 

If the data is forest sparse, the support set of different trees are dependent. It means that 
the non-zero subtree on each tree should be the same as others. Then, the number of 
combinations is still L-j-. 

{^(In2 + k + \n{N/{k + 1)) + Tkln{12/6) + t) 
for k< [log, N\ 
^(ln2 + A;ln4 + ln((c4Ar)/A;) + Tkln{12/6) + t) ^ ' 

for k > [logs 

For both cases, the bound is reduced to m = 0{Tk + log(A^/A;)). 
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